🤖 perf: smooth text streaming (kill cascade re-renders, model-aware reveal) #3219
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0a945ed7bc
Streaming TPS pill could briefly show inflated values at the start of a new stream, then drop abruptly to the real value, looking like cached stale data. Two root causes:
1. calculateTPS divided by (now - firstDelta.timestamp). For a brand-new stream's first delta that span is just a few ms, e.g. 50 tokens / 0.005 s = 10000 t/s. As more deltas accumulate the window broadens and TPS settles, hence the visible drop. Phase 1's microtask cadence exposed this where the prior idle-callback batching used to mask it. Floor the divisor to a 1 s minimum window so the rate ramps up smoothly over the first second of a stream instead of spiking. Underestimation during the settling window is acceptable; order-of-magnitude overestimation isn't.
2. The stream-error event handler in applyWorkspaceChatEventToAggregator didn't call clearTokenState, leaving the errored message's deltaHistory entry to leak. Match stream-end / stream-abort and clear it so a follow-up stream starts with a clean slate.
Adds tests for both behaviors.
Codex P1 on PR #3219: every TypewriterMarkdown instance was subscribing to useWorkspaceStreamingStats(workspaceId) regardless of streaming state. Long transcripts of completed assistant messages then re-rendered on every stream-delta of the new live message, undoing part of the cascade-rerender fix. Subscribe to the real workspace key only while the message is actively streaming; completed messages subscribe to the empty-string sentinel which is never bumped. Hook still runs unconditionally per rules of hooks — only the key changes.
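A minimal sketch of the sentinel-key pattern this fix describes, built on useSyncExternalStore. Only the hook name and the empty-string sentinel come from the PR; the store internals here are assumptions:

```ts
import { useCallback, useSyncExternalStore } from "react";

// Hypothetical per-workspace version store mirroring the PR's described
// streamingStatsStore (a MapStore); names and shapes here are assumptions.
const versions = new Map<string, number>();
const listeners = new Map<string, Set<() => void>>();

export function bumpStreamingStats(workspaceId: string): void {
  versions.set(workspaceId, (versions.get(workspaceId) ?? 0) + 1);
  for (const notify of listeners.get(workspaceId) ?? []) notify();
}

export function useWorkspaceStreamingStats(key: string): number {
  const subscribe = useCallback(
    (onChange: () => void) => {
      const set = listeners.get(key) ?? new Set<() => void>();
      listeners.set(key, set);
      set.add(onChange);
      return () => {
        set.delete(onChange);
      };
    },
    [key],
  );
  // key = "" is a sentinel the store never bumps: completed messages
  // subscribe to it and therefore never re-render on live deltas.
  return useSyncExternalStore(subscribe, () => versions.get(key) ?? 0);
}
```

Call sites then pass the real key only while streaming, e.g. useWorkspaceStreamingStats(isStreaming ? workspaceId : ""), so the hook still runs unconditionally and only the subscription key changes.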
@codex review Addressed your P1: Added regression tests in
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ff12fced30
Codex P2 on PR #3219: streamingStatsStore was bumped on stream-end and stream-abort but not on stream-error. Subscribers (TypewriterMarkdown via useWorkspaceStreamingStats) could keep returning the failed stream's TPS/charsPerSec until the next delta arrived, leaking stale rates into the next stream's early renders. Mirror the stream-end / stream-abort terminal-cleanup pattern in the stream-error path: cancel any pending coalesced bumps, then bump streamingStatsStore so consumers re-read and the snapshot collapses to null (getActiveStreamMessageId is already undefined post-error). Adds a regression test that drives stream-start + stream-delta + caught-up to populate the cache, then asserts both that subscribers are notified and that the post-error snapshot is null.
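A sketch of the terminal-cleanup symmetry this fix describes. The helpers are the PR's described internals, declared (not implemented) here so the shape typechecks:

```ts
// Assumed helper signatures, per the PR's description of Phase 1 internals.
declare function cancelPendingStreamingBump(workspaceId: string): void;
declare function clearTokenState(workspaceId: string): void;
declare function bumpStreamingStats(workspaceId: string): void;

type TerminalStreamEvent = "stream-end" | "stream-abort" | "stream-error";

// stream-error now takes the same path as stream-end / stream-abort.
function onTerminalStreamEvent(workspaceId: string, _event: TerminalStreamEvent): void {
  // Drop any coalesced microtask bump still in flight for this workspace.
  cancelPendingStreamingBump(workspaceId);
  // Forget the stream's deltaHistory so the next stream's TPS starts clean.
  clearTokenState(workspaceId);
  // Notify subscribers so the snapshot collapses to null
  // (getActiveStreamMessageId is already undefined post-error).
  bumpStreamingStats(workspaceId);
}
```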
@codex review Addressed your P2:
Codex Review: Didn't find any major issues. Nice work!
Upstream regression: @smithy/util-retry@4.3.7 was published at
2026-05-02T04:09:26Z with workspace:^ refs that escaped the smithy
monorepo. Any lockfile-free bun install (used by both
scripts/check-bench-agent.sh and scripts/smoke-test.sh — which mimic
'bun x mux@latest' for end users) fails to resolve those refs.
Reproduction outside this repo:
mkdir t && cd t
echo '{"dependencies":{"@aws-sdk/credential-providers":"^3.940.0"}}' > package.json
bun install --ignore-scripts
# → 'Workspace dependency "@smithy/types" not found'
This breaks every PR opened/pushed after 04:09Z (PR #3219 was lucky to
merge ~30 min before). Add an npm-style overrides pin to <=4.3.6 (the
last known-good release, 2026-04-28) until smithy republishes.
Verified locally: 'bash scripts/check-bench-agent.sh' now passes, and
@smithy/util-retry resolves to 4.3.6 in a fresh lockfile-free install.
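For reference, a minimal sketch of the shape such a pin takes in the root package.json (an npm-style overrides block, which bun honors); the exact range mirrors the "<=4.3.6" described above:

```json
{
  "overrides": {
    "@smithy/util-retry": "<=4.3.6"
  }
}
```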
Summary
Streamed assistant text (and reasoning) was visibly jittery: periodic catch-up jumps every few seconds, a rate stuck at ~72 chars/sec regardless of what the model emitted, and a sub-frame of work for the entire chat list on every delta. This PR makes the cadence smooth in three ordered fixes plus a TPS-display fix discovered during review: leaf-subscribe the streaming-stats pill so it stops invalidating WorkspaceState, replace the smoothing engine's hard snap with a model-aware soft catch-up, compact streaming parts on append, and floor the TPS calculator's time span so a new stream's first deltas don't spike the displayed rate.
Background
The renderer has had a two-clock smoothing model (SmoothTextEngine + useSmoothStreamingText) for a while, but several regressions defeated it:
1. WorkspaceState.streamingTokenCount / streamingTPS were computed inside the getWorkspaceState snapshot using Date.now(). Every coalesced delta produced a new snapshot reference, which cascaded WorkspaceShell → ChatPane → MessageRenderer through every row.
2. useDeferredValue was bypassed for the entire stream by shouldBypassDeferredMessages, so reconciliation ran at the ingestion rate.
3. getAdaptiveRate(backlog) ignored the model's actual emission rate. With a fast model (~120 cps) and BASE_CHARS_PER_SEC = 72, the visible cursor fell behind by ~5 chars per ingestion cycle until backlog crossed MAX_VISUAL_LAG_CHARS = 120, at which point enforceMaxVisualLag snapped visible := full - 120 and zeroed the budget. That snap is exactly the visible "catch-up jump" (see the sketch after this list).
4. requestIdleCallback({ timeout: 100 }) was used for streaming deltas. The smoothing engine should be the only pacing layer; idle batching just feeds (2).
5. handleStreamDelta appended a fresh { type: "text" } part per chunk; mergeAdjacentParts re-merged on every render. For a 10k-char reply that's tens of thousands of merges per turn.
6. calculateTPS divided by now - firstDelta.timestamp. With one delta that span is typically a few milliseconds, so e.g. 50 tokens / 0.005 s = 10000 t/s. Phase 1's microtask cadence exposed this (the prior idle-callback batching used to mask it by sampling later), and Phase 2 wired TPS into the smoothing engine, amplifying its visibility.
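To make regression (3) concrete, a back-of-the-envelope sketch with the constants above (the steady-cadence framing is a simplification):

```ts
// Constants from the pre-fix engine; the model rate is the PR's example.
const modelCps = 120; // fast model's emission rate, chars/sec
const BASE_CHARS_PER_SEC = 72; // old fixed reveal rate
const MAX_VISUAL_LAG_CHARS = 120; // old hard-snap threshold

// Backlog grows at the difference between ingestion and reveal...
const backlogGrowthCps = modelCps - BASE_CHARS_PER_SEC; // 48 chars/sec
// ...so the hard snap (visible := full - 120, budget := 0) fired roughly
// every MAX_VISUAL_LAG_CHARS / backlogGrowthCps seconds: the periodic jump.
const secondsBetweenSnaps = MAX_VISUAL_LAG_CHARS / backlogGrowthCps; // 2.5
console.log({ backlogGrowthCps, secondsBetweenSnaps });
```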
Implementation
Four commits, ordered so each phase is verifiable in isolation:
Phase 1 — leaf-subscribe streaming stats, microtask ingestion (775e9023c)
- Removed streamingTokenCount / streamingTPS from WorkspaceState.
- New: WorkspaceStreamingStats + streamingStatsStore (a MapStore) + the useWorkspaceStreamingStats(workspaceId) leaf hook (mirrors the existing useWorkspaceStatsSnapshot pattern at WorkspaceStore.ts:4127).
- Replaced scheduleIdleStateBump with scheduleStreamingStateBump for streaming delta types (stream-delta, tool-call-delta, reasoning-delta). It coalesces on queueMicrotask instead of an idle callback; see the sketch after this list.
- init-output and bash-output keep the idle path (terminal-style throughput).
- Wired cancelPendingStreamingBump into stream-end / stream-abort / replay reset / removeWorkspace.
- StreamingBarrier now reads via the leaf hook.
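A sketch of the microtask coalescing described above, assuming the PR's function names; the real implementation lives in the store layer and may differ in shape:

```ts
// Declared, not implemented: the leaf-store notification from Phase 1.
declare function bumpStreamingStats(workspaceId: string): void;

const pendingBumps = new Set<string>();

export function scheduleStreamingStateBump(workspaceId: string): void {
  if (pendingBumps.has(workspaceId)) return; // already queued this tick
  pendingBumps.add(workspaceId);
  queueMicrotask(() => {
    // A terminal event may have cancelled the bump before the drain ran.
    if (pendingBumps.delete(workspaceId)) bumpStreamingStats(workspaceId);
  });
}

export function cancelPendingStreamingBump(workspaceId: string): void {
  pendingBumps.delete(workspaceId);
}
```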
Phase 2 — model-aware smoothing engine, soft catch-up (85fb141da)
- SmoothTextEngine.update() accepts an optional liveCharsPerSec.
- getAdaptiveRate(backlog, liveCps) combines a steady-state floor (max(BASE, liveCps)), a soft catch-up ramp that drains lag over SOFT_CATCHUP_DRAIN_MS once it exceeds SOFT_CATCHUP_LAG_CHARS = 60, and the legacy backlog-pressure ramp (kept as an upper bound); see the sketch after this list.
- MAX_VISUAL_LAG_CHARS is now 1024 (was 120) — a defensive safety net for paused-tab pathological bursts that normal streams never hit.
- MIN_FRAME_CHARS raised from 1 to 2 so reveals coalesce to ~30 Hz at the BASE rate (half the markdown re-parse cost; humans can't see the difference). Tail-end reveal still works because the gate is now min(MIN_FRAME_CHARS, backlog).
- useSmoothStreamingText and TypewriterMarkdown thread liveCharsPerSec through; TypewriterMarkdown accepts a new workspaceId prop, forwarded from AssistantMessage and ReasoningMessage (via MessageRenderer).
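A sketch of how the three rate components might combine. The constants with stated values are the PR's; SOFT_CATCHUP_DRAIN_MS's value and the pressure-ramp shape are assumptions about SmoothTextEngine internals:

```ts
const BASE_CHARS_PER_SEC = 72;
const SOFT_CATCHUP_LAG_CHARS = 60;
const SOFT_CATCHUP_DRAIN_MS = 1500; // assumed value, not stated in the PR

function getAdaptiveRate(backlog: number, liveCps?: number): number {
  // Steady-state floor: never reveal slower than the model actually emits.
  const floor = Math.max(BASE_CHARS_PER_SEC, liveCps ?? 0);
  // Soft catch-up: past the lag threshold, add just enough rate to drain
  // the excess over SOFT_CATCHUP_DRAIN_MS instead of snapping.
  const excess = Math.max(0, backlog - SOFT_CATCHUP_LAG_CHARS);
  const catchUpCps = (excess * 1000) / SOFT_CATCHUP_DRAIN_MS;
  // Legacy backlog-pressure ramp retained as an upper bound (shape assumed);
  // never binds below the floor, so steady-state reveal is unaffected.
  const pressureCeiling = Math.max(floor, BASE_CHARS_PER_SEC * (1 + backlog / 8));
  return Math.min(floor + catchUpCps, pressureCeiling);
}
```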
Phase 3 — compact-on-append, clean prop surface (0a945ed7b)
- StreamingMessageAggregator.handleStreamDelta / handleReasoningDelta append into the previous adjacent text/reasoning part in place (see the sketch after this list). For a 10k-char reply this drops parts.length from thousands to one and mergeAdjacentParts cost from O(N) to O(1). Backend persistence (partial.json, chat.jsonl) is unaffected — those writers live backend-side; this aggregator's parts is pure display state.
- TypewriterMarkdown: dropped the deltas: string[] prop shape (always passed as a [content] literal, which defeated React.memo) for content: string. Removed the manual React.memo and the inner useMemo for the streaming-context value (React Compiler handles both).
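A sketch of compact-on-append with a simplified part shape; the helper name is illustrative:

```ts
type DisplayPart = { type: "text" | "reasoning"; text: string };

function appendStreamingDelta(
  parts: DisplayPart[],
  type: DisplayPart["type"],
  delta: string,
): void {
  const last = parts[parts.length - 1];
  if (last !== undefined && last.type === type) {
    // Mutate the adjacent part in place: parts.length stays constant, so
    // mergeAdjacentParts sees an already-compacted array (O(1) per render).
    last.text += delta;
  } else {
    parts.push({ type, text: delta });
  }
}
```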
Phase 4 — TPS calculator floor + stream-error token cleanup (a476613be)
- calculateTPS now floors the divisor at MIN_TPS_TIME_SPAN_MS = 1000, sketched after this list. With one delta the rate becomes tokens / 1 s instead of tokens / 0.005 s, so the reported TPS ramps up smoothly over the first second of a stream instead of spiking and "dropping abruptly". Slight under-statement during the settling window is the trade-off — strictly preferable to an order-of-magnitude over-statement.
- The stream-error branch in applyWorkspaceChatEventToAggregator now calls clearTokenState, matching stream-end and stream-abort. Without it, the errored message's deltaHistory entry leaks into a follow-up stream's TPS calculation.
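A sketch of the floored calculation; the constant is the PR's, the delta-record shape is an assumption:

```ts
const MIN_TPS_TIME_SPAN_MS = 1000;

interface DeltaRecord {
  tokens: number;
  timestamp: number; // ms epoch of the delta's arrival
}

function calculateTPS(deltaHistory: DeltaRecord[], now: number): number {
  if (deltaHistory.length === 0) return 0;
  const rawSpanMs = now - deltaHistory[0].timestamp;
  if (rawSpanMs < 0) return 0; // clock skew guard, per the PR's tests
  // Floor the window: 50 tokens 5 ms into a stream reads as 50 t/s, not 10000.
  const spanMs = Math.max(rawSpanMs, MIN_TPS_TIME_SPAN_MS);
  const totalTokens = deltaHistory.reduce((sum, d) => sum + d.tokens, 0);
  return (totalTokens * 1000) / spanMs;
}
```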
Validation
- make typecheck ✅
- make lint ✅
- Targeted suites pass: SmoothTextEngine, useSmoothStreamingText, StreamingMessageAggregator, applyWorkspaceChatEventToAggregator, StreamingTPSCalculator, TypewriterMarkdown, ReasoningMessage, StreamingBarrier{,View}, PinnedTodoList, WorkspaceStore, plus the broader src/browser/utils/messages/, src/browser/features/Messages/, src/browser/stores/, and src/browser/hooks/ suites.
- New tests:
  - SmoothTextEngine.test.ts: rate tracks liveCharsPerSec; soft catch-up engages for 60–1024-char lags without a snap; hard snap still fires above the safety threshold.
  - StreamingTPSCalculator.test.ts: 1 s floor applied for tiny/zero spans; raw span used once it exceeds the floor; negative spans (clock skew) return 0.
  - applyWorkspaceChatEventToAggregator.test.ts: stream-error calls clearTokenState.
Risks
Localized to the streaming display path; no protocol or persistence changes.
- WorkspaceState is bumped once per microtask drain instead of once per requestIdleCallback. Net effect under heavy load is less work because the snapshot stops invalidating per-delta TPS, but it's a behavioral shift — verified via the existing 106-test WorkspaceStore suite plus targeted StreamingBarrier tests.
- MAX_VISUAL_LAG_CHARS jumped 120 → 1024 and MIN_FRAME_CHARS 1 → 2. The existing "caps visual lag when incoming text jumps ahead" test still passes against the new soft-ramp behavior, and the new "hard-snaps when lag exceeds the safety threshold" test confirms the safety net still functions.
- The parts array changes shape during streaming. The aggregator already had compaction at stream-end (compactMessageParts); we're just doing it eagerly. No on-disk format change. All StreamingMessageAggregator and applyWorkspaceChatEventToAggregator tests pass.
- sessionTimingService also calls calculateTPS; the same floor applies there, but the backend's window is broader so the visible effect is smaller. No risk to persisted usage/cost calculations — those use usage.outputTokens / duration from the API, not the streaming TPS estimator.
Generated with mux • Model: anthropic:claude-opus-4-7 • Thinking: xhigh • Cost: $23.55